Language model (English Wikipedia edition)
A statistical language model is a probability distribution over sequences of words. Given such a sequence, say of length ''m'', it assigns a probability P(w_1,\ldots,w_m) to the whole sequence. Having a way to estimate the relative likelihood of different phrases is useful in many natural language processing applications. Language modeling is used in speech recognition, machine translation, part-of-speech tagging, parsing, handwriting recognition, information retrieval and other applications.
In speech recognition, the computer tries to match sounds with word sequences. The language model provides context to distinguish between words and phrases that sound similar. For example, in American English, the phrases "recognize speech" and "wreck a nice beach" are pronounced almost the same but mean very different things. These ambiguities are easier to resolve when evidence from the language model is incorporated with the pronunciation model and the acoustic model.
Language models are used in information retrieval in the query likelihood model. Here a separate language model is associated with each document in a collection. Documents are ranked based on the probability of the query ''Q'' in the document's language model P(Q\mid M_d). Commonly, the unigram language model, also known as the bag of words model, is used for this purpose.
Data sparsity is a major problem in building language models. Most possible word sequences will not be observed in training. One solution is to assume that the probability of a word depends only on the previous ''n'' − 1 words. This is known as an ''n''-gram model, or a unigram model when ''n'' = 1, in which case each word is treated as independent of its context.
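The ''n''-gram assumption can be illustrated with a small counting sketch. This is a minimal example under stated assumptions, not a reference implementation: the function names `train_ngram` and `ngram_prob` and the toy sentence are hypothetical, and the estimate is a plain maximum-likelihood ratio of counts with no smoothing.

```python
from collections import Counter

def train_ngram(tokens, n):
    """Count n-grams and their (n-1)-word contexts in a token list."""
    positions = range(len(tokens) - n + 1)
    ngrams = Counter(tuple(tokens[i:i + n]) for i in positions)
    contexts = Counter(tuple(tokens[i:i + n - 1]) for i in positions)
    return ngrams, contexts

def ngram_prob(word, context, ngrams, contexts):
    """Maximum-likelihood estimate of P(word | context).

    Returns 0.0 when the context was never observed in training.
    """
    context = tuple(context)
    if contexts[context] == 0:
        return 0.0
    return ngrams[context + (word,)] / contexts[context]

# Toy training text (an assumption for illustration only).
tokens = "the cat sat on the mat the cat ran".split()

# Bigram model: n = 2, so each word is conditioned on n - 1 = 1 previous word.
bigrams, bigram_contexts = train_ngram(tokens, 2)
p_cat_given_the = ngram_prob("cat", ["the"], bigrams, bigram_contexts)
p_dog_given_the = ngram_prob("dog", ["the"], bigrams, bigram_contexts)

# Unigram model: n = 1, so the context is empty.
unigrams, empty_context = train_ngram(tokens, 1)
p_the = ngram_prob("the", [], unigrams, empty_context)
```

In this sketch an unseen pair such as "the dog" receives probability zero, which is the data sparsity problem described above in miniature.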
== Unigram models ==
A unigram model used in information retrieval can be treated as the combination of several one-state finite automata.〔Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze: An Introduction to Information Retrieval, pages 237–240. Cambridge University Press, 2009〕 It drops the conditioning context from each term's probability, simplifying, e.g., P(t_1t_2t_3)=P(t_1)P(t_2\mid t_1)P(t_3\mid t_1t_2) to P_\text{uni}(t_1t_2t_3)=P(t_1)P(t_2)P(t_3).
In this model, the probability of each word depends only on that word itself, so the units are one-state finite automata. Each automaton has only one way to reach its single state, with one probability assigned to it. Over the whole model, the probabilities of all these one-state automata must sum to 1. For a unigram model of a document:
: \sum_{\text{term in doc}} P(\text{term}) = 1
The probability generated for a specific query is calculated as
: P(\text{query}) = \prod_{\text{term in query}} P(\text{term})
Each document gets its own unigram model, with its own word probabilities. The same query therefore receives a different generation probability under each document's model, and documents can be ranked for the query according to these probabilities.
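The ranking procedure can be sketched as follows. This is an illustrative example only: the two toy documents, the query strings, and the helper names `unigram_model` and `query_likelihood` are assumptions, tokenization is a simple whitespace split, and no smoothing is applied.

```python
from collections import Counter
from math import prod

def unigram_model(text):
    """Maximum-likelihood unigram model: term -> relative frequency in the text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}

def query_likelihood(query, model):
    """Product of per-term probabilities; unseen terms contribute 0.0."""
    return prod(model.get(term, 0.0) for term in query.lower().split())

# Toy collection (hypothetical documents for illustration).
docs = {
    "doc1": "the cat sat on the mat",
    "doc2": "the dog chased the cat and the bird",
}
models = {name: unigram_model(text) for name, text in docs.items()}

def rank(query):
    """Document names ordered by decreasing query likelihood."""
    return sorted(models,
                  key=lambda name: query_likelihood(query, models[name]),
                  reverse=True)

ranking_cat = rank("the cat")   # doc1 scores 2/6 * 1/6, doc2 scores 3/8 * 1/8
score_dog_doc1 = query_likelihood("the dog", models["doc1"])  # "dog" is unseen in doc1
```

Because the score is a product, a single query term missing from a document drives that document's score to zero.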
In information retrieval contexts, unigram language models are often smoothed to avoid instances where ''P''(term) = 0. A common approach is to generate a maximum-likelihood model for the entire collection and linearly interpolate the collection model with a maximum-likelihood model for each document to create a smoothed document model.〔Buttcher, Clarke, and Cormack. Information Retrieval: Implementing and Evaluating Search Engines. pg. 289–291. MIT Press.〕
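The linear interpolation described above can be sketched as below. This is a minimal example under stated assumptions: the mixing weight `lam` (set to 0.5 here), the toy documents, and the function names `mle` and `smoothed_prob` are all hypothetical, and in practice the weight would need to be tuned.

```python
from collections import Counter

def mle(tokens):
    """Maximum-likelihood unigram model: term -> relative frequency."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}

def smoothed_prob(term, doc_model, collection_model, lam=0.5):
    """Linearly interpolate the document model with the collection model.

    lam is an assumed weight on the document model (0 <= lam <= 1).
    """
    return (lam * doc_model.get(term, 0.0)
            + (1 - lam) * collection_model.get(term, 0.0))

# Toy collection (hypothetical documents for illustration).
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat and the bird".split(),
]
doc_models = [mle(doc) for doc in docs]
collection_model = mle([token for doc in docs for token in doc])

# "dog" never occurs in the first document, yet its smoothed probability is non-zero.
p_dog_doc1 = smoothed_prob("dog", doc_models[0], collection_model)

# The smoothed model is still a probability distribution over the collection vocabulary.
total_mass = sum(smoothed_prob(term, doc_models[0], collection_model)
                 for term in collection_model)
```

Any term that appears somewhere in the collection gets a non-zero probability in every smoothed document model, so a query containing it no longer scores zero.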

Excerpt source: Wikipedia, the free encyclopedia